Overview

Dataset statistics

Number of variables: 11
Number of observations: 1109656
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 93.1 MiB
Average record size in memory: 88.0 B
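The memory figures above are consistent with every value occupying 8 bytes. The sketch below redoes that arithmetic. The 8-byte width (as for a 64-bit integer or float column) is an assumption, since the report does not list the column dtypes.

```python
# Figures taken from the dataset statistics above.
n_variables = 11
n_observations = 1_109_656

# Assumption: each value is stored in 8 bytes (e.g. int64 or float64).
BYTES_PER_VALUE = 8
MIB = 2 ** 20

record_size_bytes = n_variables * BYTES_PER_VALUE
total_size_mib = n_observations * record_size_bytes / MIB
column_size_mib = n_observations * BYTES_PER_VALUE / MIB

print(f"Average record size: {record_size_bytes} B")
print(f"Total size in memory: {total_size_mib:.1f} MiB")
print(f"Memory size per variable: {column_size_mib:.1f} MiB")
```

Under that assumption the per-variable figure also matches the "Memory size" reported for each variable further down.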

Variable types

NUM: 11

Reproduction

Analysis started: 2020-07-30 22:45:00.990356
Analysis finished: 2020-07-30 22:48:32.306469
Duration: 3 minutes and 31.32 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
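The reported duration can be checked against the two timestamps. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-07-30 22:45:00.990356", FORMAT)
finished = datetime.strptime("2020-07-30 22:48:32.306469", FORMAT)

# Split the elapsed time into whole minutes and remaining seconds.
elapsed_seconds = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed_seconds, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```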

Warnings

EQI_zip is highly skewed (γ1 = 57.40680154)
RECPI_zip is highly skewed (γ1 = 25.46036202)
EQI_MSA is highly skewed (γ1 = 36.11887258)
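For readers unfamiliar with γ1, here is a minimal sketch of how a skewness coefficient can be computed. The function `skewness` is hypothetical and written for illustration. It assumes the report uses a sample-adjusted estimator of the kind pandas provides, which the report itself does not state.

```python
import math


def skewness(values, bias_corrected=True):
    """Return the skewness of a sequence of numbers (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    if not bias_corrected:
        return g1
    # Sample adjustment; requires at least three observations.
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
print(skewness(symmetric))
print(skewness(right_tailed))
```

A symmetric sample yields zero, while a sample with a long right tail yields a positive value. The variables flagged above have a few very large values relative to their medians, which is the pattern that drives γ1 far above zero.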

Variables

year
Real number (ℝ≥0)

Distinct count: 29
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2002.0
Minimum: 1988
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 1988
5-th percentile: 1989
Q1: 1995
Median: 2002
Q3: 2009
95-th percentile: 2015
Maximum: 2016
Range: 28
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.366604035
Coefficient of variation (CV): 0.004179122895
Kurtosis: -1.202857156
Mean: 2002
Median Absolute Deviation (MAD): 7
Skewness: 0
Sum: 2221531312
Variance: 70.00006308
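Several of the statistics above are derived from the others. The sketch below recomputes them from the reported values for `year` as a consistency check. It assumes every year occurs equally often, which follows from 29 distinct years, a top count of 38264, and 29 × 38264 = 1109656 observations.

```python
import statistics

# Values reported for `year` above.
minimum, maximum = 1988, 2016
q1, q3 = 1995, 2009
mean = 2002.0
std = 8.366604035

value_range = maximum - minimum  # reported: 28
iqr = q3 - q1                    # reported: 14
cv = std / mean                  # reported: 0.004179122895
variance = std ** 2              # reported: 70.00006308

# Assumption: each year is equally frequent, so one copy of each year
# gives the same median and MAD as the full column.
years = list(range(minimum, maximum + 1))
median = statistics.median(years)
mad = statistics.median([abs(y - median) for y in years])  # reported: 7

print(value_range, iqr, cv, variance, median, mad)
```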
Histogram with fixed size bins (bins=10)
Common values
Value              Count   Frequency (%)
2016               38264   3.4%
2001               38264   3.4%
1989               38264   3.4%
1990               38264   3.4%
1991               38264   3.4%
1992               38264   3.4%
1993               38264   3.4%
1994               38264   3.4%
1995               38264   3.4%
1996               38264   3.4%
Other values (19)  727016  65.5%

Minimum 5 values
Value  Count  Frequency (%)
1988   38264  3.4%
1989   38264  3.4%
1990   38264  3.4%
1991   38264  3.4%
1992   38264  3.4%

Maximum 5 values
Value  Count  Frequency (%)
2016   38264  3.4%
2015   38264  3.4%
2014   38264  3.4%
2013   38264  3.4%
2012   38264  3.4%
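The frequency column is each count divided by the number of observations. A short sketch of that arithmetic for `year`, using only figures from the report:

```python
n_observations = 1_109_656
distinct_years = 29
count_per_year = 38_264

# Every year has the same count, so the counts add up to the total.
total_from_counts = distinct_years * count_per_year

# Share of one year, and of the 19 years grouped as "Other values".
frequency_pct = 100 * count_per_year / n_observations
other_count = (distinct_years - 10) * count_per_year
other_pct = 100 * other_count / n_observations

print(f"{frequency_pct:.1f}%  {other_count}  {other_pct:.1f}%")
```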
 

zipcode
Real number (ℝ≥0)

Distinct count: 38264
Unique (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49980.3696424838
Minimum: 501
Maximum: 99929
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 501
5-th percentile: 6349
Q1: 27309
Median: 49331.5
Q3: 73056.25
95-th percentile: 95467
Maximum: 99929
Range: 99428
Interquartile range (IQR): 45747.25

Descriptive statistics

Standard deviation: 27726.5519
Coefficient of variation (CV): 0.5547488363
Kurtosis: -1.108009529
Mean: 49980.36964
Median Absolute Deviation (MAD): 23105
Skewness: 0.05135383944
Sum: 5.546101706e+10
Variance: 768761680.1
Histogram with fixed size bins (bins=10)
Common values
Value                 Count    Frequency (%)
2047                  29       < 0.1%
97634                 29       < 0.1%
34141                 29       < 0.1%
54615                 29       < 0.1%
56662                 29       < 0.1%
50517                 29       < 0.1%
62803                 29       < 0.1%
64850                 29       < 0.1%
58705                 29       < 0.1%
5455                  29       < 0.1%
Other values (38254)  1109366  > 99.9%

Minimum 5 values
Value  Count  Frequency (%)
501    29     < 0.1%
1001   29     < 0.1%
1002   29     < 0.1%
1003   29     < 0.1%
1004   29     < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
99929  29     < 0.1%
99928  29     < 0.1%
99927  29     < 0.1%
99926  29     < 0.1%
99925  29     < 0.1%

EQI_zip
Real number (ℝ≥0)

SKEWED

Distinct count: 754520
Unique (%): 68.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0004725326979767446
Minimum: 5.6644253e-06
Maximum: 0.3660055999999999
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 5.6644253e-06
5-th percentile: 6.3407526e-05
Q1: 0.000148677975
Median: 0.00025794316
Q3: 0.00043992483
95-th percentile: 0.0012042441
Maximum: 0.3660056
Range: 0.3659999356
Interquartile range (IQR): 0.000291246855

Descriptive statistics

Standard deviation: 0.00170852091
Coefficient of variation (CV): 3.615667057
Kurtosis: 6978.528196
Mean: 0.000472532698
Median Absolute Deviation (MAD): 0.00012941556
Skewness: 57.40680154
Sum: 524.3487435
Variance: 2.919043699e-06
Histogram with fixed size bins (bins=10)
Common values
Value                  Count    Frequency (%)
0.00035178766          2268     0.2%
0.0005039556           2168     0.2%
0.0006497198           1668     0.2%
0.00016539691          1604     0.1%
0.00017979072          1582     0.1%
0.0004910004           1565     0.1%
0.0016380961           1482     0.1%
0.00023086638          1336     0.1%
0.00015789634          1333     0.1%
0.00022039752          1277     0.1%
Other values (754510)  1093373  98.5%

Minimum 5 values
Value          Count  Frequency (%)
5.6644253e-06  28     < 0.1%
5.6667004e-06  27     < 0.1%
5.7531083e-06  2      < 0.1%
6.0264424e-06  2      < 0.1%
6.440275e-06   41     < 0.1%

Maximum 5 values
Value       Count  Frequency (%)
0.3660056   1      < 0.1%
0.32537597  1      < 0.1%
0.30560067  1      < 0.1%
0.28239885  1      < 0.1%
0.22537343  1      < 0.1%

SFR_zip
Real number (ℝ≥0)

Distinct count: 2120
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.757130137628238
Minimum: 1.0
Maximum: 6883.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 3.333333333
Q3: 19
95-th percentile: 163
Maximum: 6883
Range: 6882
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 88.68840254
Coefficient of variation (CV): 2.792708351
Kurtosis: 170.9534565
Mean: 31.75713014
Median Absolute Deviation (MAD): 2.333333333
Skewness: 8.469755335
Sum: 35239490
Variance: 7865.632745
Histogram with fixed size bins (bins=10)
Common values
Value                Count   Frequency (%)
1                    317083  28.6%
2                    112475  10.1%
3                    65720   5.9%
4                    46565   4.2%
5                    35260   3.2%
6                    28622   2.6%
7                    23705   2.1%
8                    19975   1.8%
9                    17484   1.6%
10                   15385   1.4%
Other values (2110)  427382  38.5%

Minimum 5 values
Value        Count   Frequency (%)
1            317083  28.6%
1.038461538  2       < 0.1%
1.04         1       < 0.1%
1.041666667  1       < 0.1%
1.043478261  2       < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
6883   1      < 0.1%
5858   1      < 0.1%
4307   1      < 0.1%
4201   1      < 0.1%
4137   1      < 0.1%

RECPI_zip
Real number (ℝ≥0)

SKEWED

Distinct count: 764504
Unique (%): 68.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.017100710044509703
Minimum: 5.6644253e-06
Maximum: 9.541773
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 5.6644253e-06
5-th percentile: 8.054863e-05
Q1: 0.00030280277
Median: 0.001045424087
Q3: 0.006675634675
95-th percentile: 0.0723692975
Maximum: 9.541773
Range: 9.541767336
Interquartile range (IQR): 0.006372831905

Descriptive statistics

Standard deviation: 0.0879202355
Coefficient of variation (CV): 5.141320757
Kurtosis: 1248.442109
Mean: 0.01710071004
Median Absolute Deviation (MAD): 0.0009196725175
Skewness: 25.46036202
Sum: 18975.90551
Variance: 0.007729967811
Histogram with fixed size bins (bins=10)
Common values
Value                  Count    Frequency (%)
0.00035178766          2145     0.2%
0.0005039556           2017     0.2%
0.0006497198           1574     0.1%
0.00017979072          1510     0.1%
0.00016539691          1507     0.1%
0.0004910004           1478     0.1%
0.0016380961           1362     0.1%
0.00015789634          1287     0.1%
0.00023086638          1219     0.1%
0.00022039752          1182     0.1%
Other values (764494)  1094375  98.6%

Minimum 5 values
Value          Count  Frequency (%)
5.6644253e-06  28     < 0.1%
5.6667004e-06  27     < 0.1%
5.7531083e-06  2      < 0.1%
6.0264424e-06  2      < 0.1%
6.440275e-06   41     < 0.1%

Maximum 5 values
Value      Count  Frequency (%)
9.541773   1      < 0.1%
9.173729   1      < 0.1%
8.419393   1      < 0.1%
7.458536   1      < 0.1%
7.4182096  1      < 0.1%

EQI_MSA
Real number (ℝ≥0)

SKEWED

Distinct count: 324109
Unique (%): 29.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0005084221387123552
Minimum: 1.1497502e-05
Maximum: 0.1535093703
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 1.1497502e-05
5-th percentile: 0.0001150341
Q1: 0.0002171501586
Median: 0.0003395700525
Q3: 0.000528607785
95-th percentile: 0.00143251701
Maximum: 0.1535093703
Range: 0.1534978728
Interquartile range (IQR): 0.0003114576264

Descriptive statistics

Standard deviation: 0.0008826023985
Coefficient of variation (CV): 1.735963742
Kurtosis: 3249.737802
Mean: 0.0005084221387
Median Absolute Deviation (MAD): 0.0001404248125
Skewness: 36.11887258
Sum: 564.1736768
Variance: 7.789869938e-07
Histogram with fixed size bins (bins=10)
Common values
Value                  Count    Frequency (%)
0.0006468147           1054     0.1%
0.00044274333          917      0.1%
0.00022086225          917      0.1%
0.00041259924          917      0.1%
0.0002870331           917      0.1%
0.0003858068           917      0.1%
0.00028066424          917      0.1%
0.00020755648          917      0.1%
0.00043378497          917      0.1%
0.00045508775          917      0.1%
Other values (324099)  1100349  99.2%

Minimum 5 values
Value           Count  Frequency (%)
1.1497502e-05   1      < 0.1%
1.41354085e-05  12     < 0.1%
1.7317287e-05   7      < 0.1%
1.8140032e-05   4      < 0.1%
2.0291376e-05   1      < 0.1%

Maximum 5 values
Value         Count  Frequency (%)
0.1535093703  1      < 0.1%
0.1415600972  1      < 0.1%
0.1139731482  1      < 0.1%
0.1100654443  1      < 0.1%
0.1024380301  1      < 0.1%

SFR_MSA
Real number (ℝ≥0)

Distinct count: 83153
Unique (%): 7.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11655.35467771093
Minimum: 1.0
Maximum: 165622.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 37
Q1: 340
Median: 2775.583333
Q3: 13295
95-th percentile: 55456
Maximum: 165622
Range: 165621
Interquartile range (IQR): 12955

Descriptive statistics

Standard deviation: 20520.71049
Coefficient of variation (CV): 1.760625143
Kurtosis: 11.90652233
Mean: 11655.35468
Median Absolute Deviation (MAD): 2693.916667
Skewness: 3.068453379
Sum: 1.293343425e+10
Variance: 421099559.2
Histogram with fixed size bins (bins=10)
Common values
Value                 Count    Frequency (%)
1                     7515     0.7%
2                     1929     0.2%
50318                 1834     0.2%
42                    1722     0.2%
14                    1696     0.2%
5                     1685     0.2%
10                    1540     0.1%
21                    1535     0.1%
4                     1528     0.1%
69                    1527     0.1%
Other values (83143)  1087145  98.0%

Minimum 5 values
Value        Count  Frequency (%)
1            7515   0.7%
1.2          5      < 0.1%
1.333333333  1      < 0.1%
1.333333333  22     < 0.1%
1.4          5      < 0.1%

Maximum 5 values
Value     Count  Frequency (%)
165622    1      < 0.1%
165573.5  1      < 0.1%
165530.5  1      < 0.1%
165462    1      < 0.1%
165426.5  1      < 0.1%

RECPI_MSA
Real number (ℝ≥0)

Distinct count: 327130
Unique (%): 29.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.114618631919836
Minimum: 1.1497502e-05
Maximum: 223.24586300000004
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 1.1497502e-05
5-th percentile: 0.010554047
Q1: 0.10644382
Median: 0.8754009302
Q3: 6.194858729
95-th percentile: 35.49057687
Maximum: 223.245863
Range: 223.2458515
Interquartile range (IQR): 6.088414909

Descriptive statistics

Standard deviation: 22.52159884
Coefficient of variation (CV): 2.77543528
Kurtosis: 35.94229434
Mean: 8.114618632
Median Absolute Deviation (MAD): 0.8577554892
Skewness: 5.544674165
Sum: 9004435.253
Variance: 507.2224142
Histogram with fixed size bins (bins=10)
Common values
Value                  Count    Frequency (%)
0.0006468147           1054     0.1%
13.122136              917      0.1%
13.04385               917      0.1%
11.020869              917      0.1%
13.529889              917      0.1%
13.982568              917      0.1%
13.397815              917      0.1%
10.166127              917      0.1%
16.727505              917      0.1%
13.33358               917      0.1%
Other values (327120)  1100349  99.2%

Minimum 5 values
Value          Count  Frequency (%)
1.1497502e-05  1      < 0.1%
1.7317287e-05  7      < 0.1%
1.8140032e-05  4      < 0.1%
2.0291376e-05  1      < 0.1%
2.0864842e-05  14     < 0.1%

Maximum 5 values
Value        Count  Frequency (%)
223.245863   1      < 0.1%
222.7909033  1      < 0.1%
222.4541515  1      < 0.1%
222.4328484  1      < 0.1%
222.3881199  1      < 0.1%

EQI_state
Real number (ℝ≥0)

Distinct count: 35824
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0005543055481338894
Minimum: 7.426358e-05
Maximum: 0.0037453347
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 7.426358e-05
5-th percentile: 0.00014480183
Q1: 0.00026605063
Median: 0.00040966412
Q3: 0.0005995629
95-th percentile: 0.0018229785
Maximum: 0.0037453347
Range: 0.00367107112
Interquartile range (IQR): 0.00033351227

Descriptive statistics

Standard deviation: 0.0004874645425
Coefficient of variation (CV): 0.8794148716
Kurtosis: 5.678371162
Mean: 0.0005543055481
Median Absolute Deviation (MAD): 0.0001568573825
Skewness: 2.323661262
Sum: 615.0884773
Variance: 2.376216802e-07
Histogram with fixed size bins (bins=10)
Common values
Value                 Count    Frequency (%)
0.0003829586          2797     0.3%
0.0017275866          2392     0.2%
0.0006546165          2389     0.2%
0.0020422665          2349     0.2%
0.0007603918          2346     0.2%
0.0010313498          2326     0.2%
0.0019577374          2326     0.2%
0.0017248512          2317     0.2%
0.002142746           2317     0.2%
0.0017149167          2315     0.2%
Other values (35814)  1085782  97.8%

Minimum 5 values
Value            Count  Frequency (%)
7.426358e-05     698    0.1%
7.625624e-05     30     < 0.1%
7.660029333e-05  11     < 0.1%
7.660762583e-05  5      < 0.1%
7.679038286e-05  1      < 0.1%

Maximum 5 values
Value           Count  Frequency (%)
0.0037453347    620    0.1%
0.003556253525  1      < 0.1%
0.003531739567  1      < 0.1%
0.003469798671  1      < 0.1%
0.003439991817  3      < 0.1%

SFR_state
Real number (ℝ≥0)

Distinct count: 34851
Unique (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44199.77196491525
Minimum: 40.0
Maximum: 330536.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 40
5-th percentile: 3167.707692
Q1: 11527
Median: 27032
Q3: 57480
95-th percentile: 144758
Maximum: 330536
Range: 330496
Interquartile range (IQR): 45953

Descriptive statistics

Standard deviation: 50037.81531
Coefficient of variation (CV): 1.13208311
Kurtosis: 6.736711374
Mean: 44199.77196
Median Absolute Deviation (MAD): 18608.33333
Skewness: 2.332689413
Sum: 4.904654216e+10
Variance: 2503782961
Histogram with fixed size bins (bins=10)
Common values
Value                 Count    Frequency (%)
113623                3611     0.3%
73716                 2797     0.3%
18675                 2513     0.2%
171222                2392     0.2%
119626                2389     0.2%
168977                2349     0.2%
92187                 2346     0.2%
156007                2326     0.2%
57601                 2326     0.2%
185741                2317     0.2%
Other values (34841)  1084290  97.7%

Minimum 5 values
Value  Count  Frequency (%)
40     211    < 0.1%
63     148    < 0.1%
65     117    < 0.1%
67     16     < 0.1%
71     150    < 0.1%

Maximum 5 values
Value        Count  Frequency (%)
330536       1286   0.1%
322793.6667  1      < 0.1%
315234.1429  1      < 0.1%
315051.3333  1      < 0.1%
314672       6      < 0.1%

RECPI_state
Real number (ℝ≥0)

Distinct count: 35822
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.203583309878322
Minimum: 0.01783298
Maximum: 442.21994000000007
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.5 MiB

Quantile statistics

Minimum: 0.01783298
5-th percentile: 0.67425174
Q1: 4.184579
Median: 11.849025
Q3: 26.99141
95-th percentile: 96.0401
Maximum: 442.21994
Range: 442.202107
Interquartile range (IQR): 22.806831

Descriptive statistics

Standard deviation: 59.33080148
Coefficient of variation (CV): 1.964362999
Kurtosis: 18.82629165
Mean: 30.20358331
Median Absolute Deviation (MAD): 9.3076
Skewness: 4.114851461
Sum: 33515587.44
Variance: 3520.144005
Histogram with fixed size bins (bins=10)
Common values
Value                 Count    Frequency (%)
28.230175             2797     0.3%
295.80084             2392     0.2%
78.30916              2389     0.2%
345.09607             2349     0.2%
70.09824              2346     0.2%
305.42075             2326     0.2%
59.406776             2326     0.2%
268.46793             2317     0.2%
397.9958              2317     0.2%
223.54796             2315     0.2%
Other values (35812)  1085782  97.8%

Minimum 5 values
Value          Count  Frequency (%)
0.01783298     58     < 0.1%
0.022516895    1      < 0.1%
0.02271251667  2      < 0.1%
0.023547488    1      < 0.1%
0.02363899875  1      < 0.1%

Maximum 5 values
Value        Count  Frequency (%)
442.21994    2269   0.2%
425.9111556  1      < 0.1%
425.087204   2      < 0.1%
424.64432    2253   0.2%
424.1752289  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
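The definition above can be sketched in a few lines of plain Python. The function `pearson_r` is an illustrative helper, not the routine the report itself uses.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
print(pearson_r(xs, ys))
# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r([10 * x + 3 for x in xs], ys))
```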

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
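A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is an assumption here, as the text does not say how ties are handled. Both helper names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# A monotonic but nonlinear relation.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because the cubic relation is strictly increasing, the ranks of both variables coincide even though the relation is not linear.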

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
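The pair-counting description corresponds to the variant often called tau-a, sketched below. Statistical libraries frequently use a tie-adjusted variant instead, so values may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # Pairs tied in x or y count as neither.
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```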

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

   year  zipcode  EQI_zip   SFR_zip  RECPI_zip  EQI_MSA   SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
0  1988  1001     0.000815  48.0     0.039108   0.001021  1235.0   1.260888   0.001476   17558.0    25.921940
1  1989  1001     0.001116  44.0     0.049100   0.001168  1049.0   1.225384   0.001751   15343.0    26.866861
2  1990  1001     0.001629  45.0     0.073317   0.001243  841.0    1.045161   0.001857   13556.0    25.172453
3  1991  1001     0.000826  27.0     0.022298   0.001375  714.0    0.981724   0.001823   12798.0    23.330479
4  1992  1001     0.002216  22.0     0.048744   0.001549  760.0    1.176877   0.002111   13289.0    28.052156
5  1993  1001     0.000827  27.0     0.022334   0.001266  824.0    1.043205   0.002057   14110.0    29.028145
6  1994  1001     0.004772  33.0     0.157489   0.001394  804.0    1.121105   0.001996   14843.0    29.624968
7  1995  1001     0.001065  23.0     0.024496   0.001667  817.0    1.362325   0.002023   15180.0    30.703022
8  1996  1001     0.001971  29.0     0.057166   0.001361  884.0    1.203223   0.002326   16520.0    38.430626
9  1997  1001     0.000840  42.0     0.035283   0.001233  902.0    1.112023   0.002386   17254.0    41.174500

Last rows

year  zipcode  EQI_zip  SFR_zip  RECPI_zip  EQI_MSA  SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
11096462007996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096472008996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096482009996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096492010996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096502011996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096512012996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096522013996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096532014996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096542015996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002
11096552016996260.0000181.00.0000180.000051924.00.157510.0000823847.00.315002